Baltimore Ceasefire 365 is a city-wide call asking Baltimore residents to avoid having any murders through quarterly Ceasefires and Peace Challenges (February, May, August, and November).
In this post, we use open data and R to look at the distribution of shootings in space and time, and model the impact of the Ceasefires.
Baltimore releases detailed data on issues relevant to the city, including crime. Here’s the raw numbers of shootings, plotted for each day.
library(tidyverse)
library(scales)
library(knitr)
bpd <- read_csv("/Users/peterphalen/Documents/ceasefire/BPD_Part_1_Victim_Based_Crime_Data.csv")
# subset to shootings or homicides with a firearm
bpd <- subset(bpd, Description == "SHOOTING" |
(Description == "HOMICIDE" & Weapon == "FIREARM"))
bpd$CrimeDate <- as.Date(bpd$CrimeDate, format = "%m/%d/%Y")
# there are many crimes per day. collapse to daily counts
daily <- bpd %>% group_by(CrimeDate) %>% summarise(shootings = n())
# fill missing dates, because some had no shootings
full.ts <- data.frame(CrimeDate = seq(daily$CrimeDate[1],
daily$CrimeDate[nrow(daily)], by="day"))
daily <- full_join(full.ts,daily)
daily <- daily %>% group_by(CrimeDate) %>% mutate_all(funs(ifelse(is.na(.),0,.)))
ggplot(daily) +
aes(x=CrimeDate, y=shootings) +
geom_point(alpha=.2) +
xlab("date") +
ylab("daily shooting deaths (vertically jittered)") +
scale_y_continuous(breaks=0:20) +
scale_x_date(labels = date_format("%b %Y")) +
ggtitle(" ",
subtitle="Baltimore (2012-present)")
You can tap neighborhoods to see exact numbers.
This map shows the raw count of murders in Baltimore since 2012 by neighborhood.
library(geojsonio)
library(leaflet)
bpd <- subset(bpd, !is.na(Neighborhood))
# count by neighobrhood
count <- bpd %>%
group_by(Neighborhood) %>%
summarise(total.count=n())
# get polygons to draw neighborhood maps
nbds <- geojsonio::geojson_read("/Users/peterphalen/Documents/ceasefire/Neighborhoods.geojson", what = "sp")
get_shooting_count <- function(neighborhood){
nbd <- as.character(neighborhood)
if(nbd %in% count$Neighborhood){
count <- count[count$Neighborhood == nbd,]$total.count
return(count)
}
if(!(nbd %in% count$Neighborhood)){
return(0)
}
}
nbds$count <- sapply(nbds$Name, get_shooting_count)
# draw legend
range.count <- range(nbds$count,na.rm=T)
labs <- c(0,50,100,150,200,250)
pal.crime <- colorNumeric(colorRamp(c('#ccccff', 'red')), labs)
leaflet(nbds) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addPolygons(stroke=T,
weight=1,
popup=paste0(nbds$Name,"<br/>Shootings: ",nbds$count),
color=~pal.crime(count),
fillOpacity=.5) %>%
addLegend("bottomright",title="# of shootings (2012-present)",colors=~pal.crime(labs),labels=~labs)
This version adjusts by the population of each neighborhood. Be careful about the super bright red areas, because some of them have very few residents and so even a single shooting will make them look really dangerous even though they’re probably not. (You can tap neighborhoods to see the number of residents.)
#--------- population-adjusted --------------#
nbds$per1k <- nbds$count / nbds$Population * 1000
nbds$per1k <- round(nbds$per1k)
nbds$per1k <- ifelse(nbds$Population == 0, NA, nbds$per1k)
labs <- c(0,25,50,75)
pal.crime <- colorNumeric(colorRamp(c('#ccccff', 'red')),
labs,
na.color = "#b2b2b2")
countlabel <- paste0(nbds$Name,"<br/>",nbds$count," shootings among ",nbds$Population," residents")
nbds$countlabel <- ifelse(nbds$Population == 0, paste0(nbds$Name,":<br/>","No residents"), countlabel)
leaflet(nbds) %>% #draw population-adjusted map,
#areas with 0 residents are greyed
#out but can still be clicked
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addPolygons(stroke=T,
weight=1,
popup=nbds$countlabel,
color=~pal.crime(per1k),
fillOpacity=.6) %>%
addLegend("bottomright",title="Shootings per one</br>thousand residents</br>(2012-present)",colors=~pal.crime(labs),labels=~labs)
Ceasefires have been called four times per year since August 2017. These are “ceasefire weekends” but their impact often extends well beyond a few days. One lasted twelve.
# first day (Friday) of ceasefire weekends
ceasefire.initial <-
as.Date(
c("08/04/2017",
"11/03/2017",
"02/02/2018",
"05/11/2018",
"08/03/2018",
"11/02/2018",
"02/01/2019",
"05/10/2019",
"08/02/2019"),
format="%m/%d/%Y")
ceasefire.weekends <-
lapply(ceasefire.initial,
function(x){
seq(from=x,
by="day",
length.out=3)
}
)
ceasefire.weekends <- do.call("c",
ceasefire.weekends)
We want to include information about the date, the day of the week (Mon-Sun), the day of the year (to measure the effects of seasons and “special days” like Christmas), and a binary variable indicating whether a date occurs during ceasefire.
library(lubridate)
# the julian calendar is a simple system for numeric dates
daily$jul <- julian(daily$CrimeDate)
daily$weekday <- factor(weekdays(daily$CrimeDate),
levels=c("Monday","Tuesday","Wednesday","Thursday",
"Friday","Saturday","Sunday"))
daily$seasonal <- yday(daily$CrimeDate)
daily$day.of.year <- date_format("%b-%d")(daily$CrimeDate)
daily$ceasefire <- factor(ifelse(daily$CrimeDate %in% ceasefire.weekends, 1, 0),
labels=c("Regular Day","Ceasefire Weekend"))
We’ll predict shootings using:
We use a Poisson link function because the outcome is a count and the mean is about the same as sd.
library(rstanarm)
model <- stan_gamm4(shootings ~
s(jul) + # spline time trend
s(seasonal, # seasonal effect
bs="cc") + #cyclical constraint
ceasefire, # ceasefire indicator
random= ~ (1 | weekday) + # day of the week
(1 | day.of.year), # every day....
data=daily,
cores=4,
iter=2000,
family=poisson)
Here is a plot of the model against the observations. The red line represents the average. The grey area is the 80% predictive interval, which means that about 80% of days will have shooting counts that fall within the grey area. 10% of the time, the number of shootings will extend above the grey area
Ceasefires are visible as eight dramatic downward red spikes beginning in 2017.
daily$Estimate <- apply(posterior_linpred(model, transform=TRUE),
2, median)
# 80% posterior predictive interval for main plot
preds <- posterior_predict(model, transform=TRUE)
preds <- apply(preds, 2, function(x){quantile(x, prob=c(.1, .9))})
daily$high <- preds["90%",]
daily$low <- preds["10%",]
daily %>%
ggplot(aes(x = CrimeDate, y = shootings)) +
geom_point(alpha=.2) +
geom_line(aes(y = Estimate), alpha=.5, color="red") +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
scale_y_continuous(breaks=c(0,4,8,12)) +
xlab("date") +
theme_bw()
We plot the marginal effects, showing the components that make up the above time series. The specific numbers of shootings are to some extent dependent upon the reference point, but these figures give you the right idea of the shape of the trends.
### Time trend plot
time.frame <- with(daily, # Ref: August
data.frame(
jul=jul,
weekday=0, # not used for this prediction
ceasefire="Regular Day",
day.of.year="Aug-01",
seasonal=yday(as.Date("2018-08-01"))))
post <- posterior_linpred(model,
newdata=time.frame,
transform=TRUE,
re.form = NA)
time.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
time.frame$low <- ci["2.5%",]
time.frame$high <- ci["97.5%",]
trend.axis.dates <- seq(from=as.Date("2012-01-01"),
by="year",
length.out=9)
time.plot <-
time.frame %>%
ggplot() +
aes(x=jul, y=Estimate) +
geom_line(aes(y = Estimate), alpha=.5) +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
xlab("Time trend") +
ylab("Shootings") +
ggtitle(" ") +
scale_x_continuous(
breaks=julian(trend.axis.dates),
labels=date_format("%m-%Y")(trend.axis.dates)) +
theme_bw()
Here’s the deseasonalized time trend we just created in the above block of code. Shootings increased sometime in 2015 and have stayed high.
time.plot
Now we’re going to construct all the seasonal graphs…
### day of year plot
doy.for.single.year <- date_format("%b-%d")(seq(as.Date("0000-01-01"), as.Date("0000-12-31"), by = 1))
exact.day.frame <- with(daily, # Ref: regular day in mid-2018
data.frame(
jul=julian(as.Date("2018-08-01"))[1],
weekday=0, # weekday not used for this prediction
ceasefire="Regular Day",
doy.index = 1:length(doy.for.single.year),
day.of.year=doy.for.single.year,
seasonal=100))
doy.samples <- as.matrix(model, regex_pars = " day.of.year")
doy.effects <- apply(doy.samples,2, median)
# scale to an odds ratio by exponentiating
doy.effects <- exp(doy.effects)
exact.day.frame$Estimate <- as.vector(doy.effects)
# 95% credible intervals (exponentiated)
ci <- exp(apply(doy.samples,2,function(x){quantile(x, prob=c(.025, .975))}))
exact.day.frame$low <- ci["2.5%",]
exact.day.frame$high <- ci["97.5%",]
doy.axis.dates <- seq(as.Date("0-01-01"),by="month",length.out=12)
largest.day <- which.max(exact.day.frame$Estimate)
largest.day.row <- exact.day.frame[largest.day,]
lowest.day <- which.min(exact.day.frame$Estimate)
lowest.day.row <- exact.day.frame[lowest.day,]
xmas.day <- exact.day.frame[which(exact.day.frame$day.of.year=="Dec-25"),]
doy.plot <-
exact.day.frame %>%
ggplot() +
aes(x=doy.index, y=Estimate) +
geom_line(aes(y = Estimate), alpha=.5) +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
xlab("Day of year") +
ylab("Shootings\n(odds ratio)") +
ggtitle(" ") +
scale_x_continuous(
breaks=yday(c(doy.axis.dates, as.Date("0-12-31"))),
labels=date_format("%b-%d")(c(doy.axis.dates, as.Date("0-01-01")))
) +
#highest day
annotate("text", x = largest.day.row$doy.index,
y = largest.day.row$Estimate + .3,
label = paste("largest increase:",largest.day.row$day.of.year)) +
geom_segment(aes(
xend = largest.day.row$doy.index,
x = largest.day.row$doy.index + 5,
yend = largest.day.row$Estimate + .02,
y = largest.day.row$Estimate + .26),
arrow = arrow(length = unit(0.5, "cm"))) +
# lowest day
annotate("text", x = lowest.day.row$doy.index - 20,
y = lowest.day.row$Estimate - .25,
label = paste("largest decrease:",lowest.day.row$day.of.year)) +
geom_segment(aes(
xend = lowest.day.row$doy.index,
x = lowest.day.row$doy.index - 6,
yend = lowest.day.row$Estimate - .02,
y = lowest.day.row$Estimate - .22),
arrow = arrow(length = unit(0.5, "cm"))) +
# christmas day
annotate("text", x = xmas.day$doy.index - 55,
y = xmas.day$Estimate + .33,
label = "Christmas day (Dec 25th)") +
geom_segment(aes(
xend = xmas.day$doy.index - 3,
x = xmas.day$doy.index - 40,
yend = xmas.day$Estimate + .02,
y = xmas.day$Estimate + .28),
arrow = arrow(length = unit(0.5, "cm"))) +
theme_bw()
### seasonal plot
seasonal.frame <- with(daily, # Ref: regular day in mid-2018
data.frame(
jul=julian(as.Date("2018-08-01"))[1],
weekday=0, # weekday not used for this prediction
ceasefire="Regular Day",
day.of.year=0,
seasonal=1:365))
post <- posterior_linpred(model,
newdata=seasonal.frame,
transform=TRUE,
re.form = NA)
seasonal.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
seasonal.frame$low <- ci["2.5%",]
seasonal.frame$high <- ci["97.5%",]
doy.axis.dates <- seq(as.Date("0-01-01"),by="month",length.out=12)
seasonal.plot <-
seasonal.frame %>%
ggplot() +
aes(x=seasonal, y=Estimate) +
geom_line(aes(y = Estimate), alpha=.5) +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
xlab("Seasonal trend") +
ylab("Shootings") +
ggtitle(" ") +
scale_x_continuous(
breaks=yday(c(doy.axis.dates, as.Date("0-12-31"))),
labels=date_format("%b")(c(doy.axis.dates, as.Date("0-01-01")))
) +
theme_bw()
### Day of week plot
wday.frame <- with(daily, # Ref: regular day in August 2018
data.frame(
jul=julian(as.Date("2018-08-01"))[1],
weekday=unique(daily$weekday),
ceasefire="Regular Day",
day.of.year=yday(as.Date("2018-08-01")),
seasonal=yday(as.Date("2018-08-01"))))
post <- posterior_linpred(model,
newdata=wday.frame,
transform=TRUE)
wday.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
wday.frame$low <- ci["2.5%",]
wday.frame$high <- ci["97.5%",]
wday.plot <-
wday.frame %>%
ggplot() +
aes(x=weekday, y=Estimate) +
geom_point(size=2) +
geom_errorbar(aes(ymin=low, ymax=high),
width=.2) +
xlab("Day of week") +
ylab("Shootings") +
ggtitle(" ") +
theme_bw()
Here’s the day-of-year plot we just constructed. We don’t see a lot of evidence for day-of-year effects: the 95% credible intervals for all these days overlap. So we could make an argument for taking it out of the model.
I’ve marked the highest and lowest days (after accounting for slower-moving seasonal trends) on the following figure. I’ve also marked Christmas day, because it was interesting to me.
doy.plot
Here are the 5 days with the lowest numbers of shootings (relative to their placement in the season).
day.effects.sorted <- exact.day.frame[order(exact.day.frame$Estimate),
"day.of.year"]
kable(data.frame(five_lowest_days=head(day.effects.sorted,5)))
| five_lowest_days |
|---|
| Jun-04 |
| Dec-11 |
| Oct-11 |
| Apr-15 |
| Feb-14 |
And here are the 5 days with highest numbers of shootings (relative to their placement in the season).
Christmas day is the second highest day. The other dates are not holidays and I can’t find anything special about them, but maybe there’s something special about them in Baltimore…
Notice that several of the days with the highest number of shootings are from months that also had the lowest number of shootings. That makes me think this could be explained by heteroskedasticity: these months may just have more variability in shootings… Not sure though.
kable(data.frame(five_highest_days=rev(tail(day.effects.sorted,5))))
| five_highest_days |
|---|
| Apr-22 |
| Dec-04 |
| Dec-25 |
| Jan-28 |
| May-28 |
We do see a strong seasonal effect, with summers showing up as particularly bad. We see the largest dips in shootings in February and March, cold and dark months in Baltimore.
seasonal.plot
There is essentially no weekday effect.
wday.plot
Finally, we can use this model to measure the effect of Ceasefires on shootings per day, after accounting for all these trends and seasonalities. The effect of the Ceasefire (plotted here as an odds ratio with 95% credible intervals) is classically statistically significant.
ceasefire.effect <- as.array(model, regex_pars = "ceasefire")
# scale to an odds ratio by exponentiating
ceasefire.effect <- exp(ceasefire.effect)
library(bayesplot)
mcmc_intervals(ceasefire.effect) +
scale_y_discrete(labels="ceasefire effect") +
xlab("odds ratio") +
xlim(c(0,1))
We’re looking at a ~50% reduction in shootings during ceasefire weekends, after accounting for all other effects (e.g., day of the week, holidays, etc).
round(1-median(ceasefire.effect),2)
## [1] 0.51
There’s enough uncertainty in our estimates that the true effect of ceasefire could be anywhere from a ~30% to a 66% reduction.
# 95% credible intervals
round(1-quantile(ceasefire.effect, probs=c(.025,.975)),2)
## 2.5% 97.5%
## 0.66 0.31
We can also use this model to see the impact of the ceasefire at specific points in time. For example, here is the model-estimated impact of the ceasefire on Friday August 2nd, 2019.
### Ceasefire plot
pred.day <- as.Date("2019-08-02")
ceasefire.frame <- with(daily,
data.frame(
jul=julian(pred.day)[1],
weekday="Friday",
ceasefire=factor(c("Regular Day",
"Ceasefire Weekend"),
levels=c("Regular Day",
"Ceasefire Weekend")),
day.of.year=yday(pred.day),
seasonal= yday(pred.day)))
post <- posterior_linpred(model,
newdata=ceasefire.frame,
transform=TRUE)
ceasefire.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.25, .75))})
ceasefire.frame$low <- ci["25%",]
ceasefire.frame$high <- ci["75%",]
# 50% posterior predictive interval for main plot
preds <- posterior_predict(model,
newdata=ceasefire.frame,
transform=TRUE)
ceasefire.frame$high.ppd <- apply(preds,2,function(x){quantile(x, prob=c(.75), na.rm=T)})
ceasefire.frame$low.ppd <- apply(preds,2,function(x){quantile(x, prob=c(.25), na.rm=T)})
ceasefire.frame %>%
ggplot() +
aes(x=ceasefire, y=Estimate) +
geom_point(aes(y = low.ppd), col="blue", shape=95, size=5) +
geom_point(aes(y = high.ppd), col="blue", shape=95, size=5) +
geom_point(aes(y = Estimate),
size=2) +
geom_errorbar(aes(ymin=low, ymax=high),
width=.2) +
xlab("") +
ylab("Shootings") +
ggtitle("Predicted shooting count for Friday August 8, 2019",
subtitle="with 50% credible intervals (black) and posterior predictive intervals (blue)") +
theme_bw()
Without a ceasefire, we’d expect about four people to get shot each day on average. But because this will be a Ceasefire weekend, the model expects about half that many to be killed.
Notice the little blue horizontal lines drawn at 1 and 3 for the ceasefire weekend estimate. Those marks represent the 50% window for the number of shootings you can expect on any given day. The fact that the lower tick mark rests at 1 and 3 means that there will be 0 shootings on a ceasefire day about 25% of the time, and >3 shootings about 25% of the time.
The ceasefire impact is real and at the same time, our model won’t be surprised by the occurrence of several shootings on each day of ceasefire weekend.